Towards the Use of Situational Information in Information Retrieval
Authors
Abstract
ors' preference for middle hierarchy labels supports theoretical work done in classification of other objects by Rosch and colleagues [19], who found that within taxonomies there is generally one level of abstraction at which most individuals tend to make category cuts. This is referred to as the basic level, and it tends to occur in the categories at the middle of the hierarchy.

2.3 Automatic recognition of structure

Liddy's work laid a foundation for the incorporation of the discourse-level structure of documents in information retrieval system databases. For abstracts of empirical research papers, at least, this structure exists in the minds of those who write them, and is associated with a number of recognisable linguistic clues. However, the step to a fully automatic structure recognition program is not a small one, and thus exploration of this problem is an important part of the present study. We continued to use Liddy's corpus of abstracts in order to take advantage of her substantial investment of effort in marking up all the texts with their Elaborated structure.

The clues to discourse-level structure seem
1. to span a full range of levels (from morphological to lexical to relationships between structural components);
2. to be probabilistic;
3. to interact with each other; and
4. to be very numerous, even within limited domains.

Lexical clues are essentially templates. For example: 'indicate that ...' usually introduces the results; 'appeared to ...' sometimes signals an interpretation of a result. Morphological clues can also be used: for example, tense endings of verbs can sometimes distinguish between well-founded statements (results of past research) and speculation (hypothesis). Other clues are semantic classes (e.g. animate nouns often indicate experimental subjects, a subcomponent of methodology), and expected orderings of components within the text.
Another aspect of the structural analysis is to establish the scope of a component: how much of the text should be included in it. Sources which are useful in these decisions are both 'cohesion' clues, which indicate the flow from one component to another, and 'continuation' clues, which indicate that a sentence is a continuation of the component started in the previous one.

If we were to assemble a set of deterministic rules for this analysis, we would expect a very large number of complex rules, most of which would be needed to deal with exceptions to simpler ones. Moreover (extrapolating from experience in linguistics and information retrieval), there would probably be a 'diminishing returns' situation, in which the addition of more complexity produced very little gain in effectiveness.

Another approach is to view rules in this domain as probabilistic. They contribute evidence toward the acceptance of hypotheses about the components of a text. The processing of the text terminates, in successful cases, when the overall likelihood of a candidate structure achieves dominance over others. The goal is to evaluate many alternative hypotheses concurrently, using a multitude of rules about clues.

Two approaches to programming the required text analysis, which are appropriate to this situation, will be given here. The first is a probabilistic method, making use of Bayes' theorem to incorporate the evidence afforded by lexical and other features. The other is a 'neural network' approach which, in principle, allows the many different factors simultaneously to influence the outcome of the analysis.

2.3.1 Probabilistic text analysis

The clues to discourse-level structure of empirical abstracts which seem to be at the same time the most important and the most straightforward are single words or stems occurring in the text. We consider an abstract, A, to be divided into a sequence of text-pieces: A1, A2, ... An.
Each text-piece Ax is composed of one or more words, some of which are potential clues, such as those identified by Liddy [17]. We suppose that Ax can belong to exactly one of the thirty-seven possible components, C1, C2, ... C37, in the Elaborated model. (As we shall see later, this is a somewhat problematic assumption.) We shall denote the potential clues for a component Cj in the text-piece Ax by e1, e2, ... er, where there is no constraint on the ei to be unique. The identification of a set e1, e2, ... er in Ax is an event that we shall call E. Our problem is to estimate, for each Ax, the thirty-seven probabilities, P(Cj|E), that Ax belongs to each component Cj (1 ≤ j ≤ 37).

Applying Bayes' Theorem, we have:

P(Cj|E) = P(Cj) × P(E|Cj) / P(E)

In this expression, P(Cj) is the prior probability that a text-piece is contained in component Cj, P(E|Cj) is the probability that the event E occurs, given that the text-piece Ax is in component Cj, and P(E) is the probability of event E occurring. If we assume that the clues comprising E, namely e1, e2, ... er, occur independently of one another, then:

P(E|Cj) = P(e1|Cj) × P(e2|Cj) × ... × P(er|Cj)

which can be estimated from the counts ni and Ni, where ni is the number of occurrences of word ei in all components of type Cj, and Ni is the frequency of ei in all abstracts, both taken from the corpus of 276. Our estimate for P(Cj) is the proportion of components in the corpus which are of type Cj. The values obtained from our data for this prior probability are given in Table 2. Given the mutual exclusivity assumption above, P(E) is the sum, over all thirty-seven components Cj, of the joint probabilities P(Cj) × P(E|Cj).

To incorporate clues of a different kind, we can treat the posterior probability, P(Cj|E), as a new prior probability, and go through a similar process to produce a new posterior probability. For example, we know that components do not occur haphazardly in an abstract, but tend to occur in certain orders.
This ordering information can be incorporated in the hope of improving the probability estimates. It is not clear-cut, but it can be captured to some extent by observing the tendencies of components to occur in grossly defined regions of the abstract. We chose to divide the abstract into three approximately equal parts in terms of word count, and compiled a table of frequencies of occurrence of each component type in each third (see Table 2).
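The two-stage update described above can be illustrated with a minimal Python sketch. It is not the authors' program: every number below is invented for illustration, only three components stand in for the thirty-seven of the Elaborated model, and the add-one smoothing used to estimate P(ei|Cj) is an assumption of this sketch rather than the estimate used in the paper. The names `PRIORS`, `CLUE_COUNTS`, `POSITION`, `likelihood`, `posterior` and `classify` are all hypothetical.

```python
from math import prod

# Hypothetical priors P(Cj) for three components (the paper uses thirty-seven).
PRIORS = {"methodology": 0.5, "results": 0.3, "hypothesis": 0.2}

# Hypothetical counts of clue words inside each component type (invented data).
CLUE_COUNTS = {
    "methodology": {"subjects": 8, "indicate": 1, "appeared": 1},
    "results": {"subjects": 1, "indicate": 7, "appeared": 2},
    "hypothesis": {"subjects": 1, "indicate": 2, "appeared": 2},
}

# Hypothetical P(third | Cj): how often each component falls in the
# first, middle, or last third of an abstract (invented data).
POSITION = {
    "methodology": [0.3, 0.5, 0.2],
    "results": [0.1, 0.3, 0.6],
    "hypothesis": [0.6, 0.3, 0.1],
}


def likelihood(clues, component):
    """P(E|Cj) under the independence assumption: a product over the clues.

    Each factor uses add-one smoothing, which is an assumption of this sketch.
    """
    counts = CLUE_COUNTS[component]
    total = sum(counts.values())
    vocab = len(counts)
    return prod((counts.get(clue, 0) + 1) / (total + vocab) for clue in clues)


def posterior(prior, likelihoods):
    """Bayes' theorem: P(Cj|E) = P(Cj) * P(E|Cj) / P(E)."""
    joint = {c: prior[c] * likelihoods[c] for c in prior}
    evidence = sum(joint.values())  # P(E), summed over mutually exclusive components
    return {c: joint[c] / evidence for c in joint}


def classify(clues, third):
    """Update on lexical clues, then reuse that posterior as the prior for position."""
    after_clues = posterior(PRIORS, {c: likelihood(clues, c) for c in PRIORS})
    return posterior(after_clues, {c: POSITION[c][third] for c in PRIORS})


if __name__ == "__main__":
    # A text-piece containing the clue 'indicate' in the last third (index 2).
    for component, p in classify(["indicate"], 2).items():
        print(f"{component}: {p:.3f}")
```

With these invented figures, a text-piece containing 'indicate' in the final third is assigned most strongly to the results component, which mirrors the lexical clue example given earlier. Chaining `posterior` twice is only one way to read "treat the posterior as a new prior"; it implicitly assumes that position and lexical clues are independent given the component.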
Similar resources
Health Information Seeking Behavior of Graduate Students Linked to Corona Virus at Qom University
Objective: Health information on diseases can help prevent their spread and support their treatment, and it is among the most vital needs of people in daily life. One health issue that has plagued the world in recent years is the corona virus. Therefore, the main purpose of this study was to investigate the health information behavior of graduate students at Qom University. Methodology: Applied descriptive survey...
Assessing the level of familiarity, use and also the effectiveness of mind maps in the information retrieval process
Background and Aim: A mind map is a full-color, illustrated form of note-taking in which the main idea or subject is situated at the center. The main ideas then branch out from the center and are linked to the central idea. This is a relatively new topic, and little research has been conducted worldwide to show its effectiveness. The aim is to examine the effectiveness of mind maps in the information retrieval process....
Analysis of the Therapists’ Information Behavior in the diagnosis and treatment of mental disorders based on Kuhlthau's information retrieval process model
Background and Aim: Under the influence of various factors, people use different methods to obtain information and exhibit different information behaviors. In recent decades, information science experts have described these behaviors in the form of patterns and models of information retrieval, which can be used in various fields. One of these areas that almost all people are...
Medical Informatics: Concepts and Applications
Medical Informatics is a developing body of knowledge concerned with the use of information and communication technology in support of medical research, education and also for promoting health care delivery. The field focuses on the biomedical information, patient data, and also acquisition, storage, retrieval and optimal use of information for problem solving and decision making. The goal of m...
The socio - cognitive theory in information retrieval (IR)
Background and Aim: The socio-cognitive theory was introduced into information science by Hjørland and Albrechtsen. The socio-cognitive view turns the traditional cognitive program upside down. The socio-cognitive theory emphasizes the different cultural and social structures of users. Hence, the aim of the article is to explain the role of socio-cognitive theory in information retrieval (I...
Visualizing search results in the information retrieval process
Purpose: One of the most effective ways to achieve optimum information retrieval is through visualization of Information. Search strategies, probing skills, querying of information needs and analysis of information play a significant role in the accessing of necessary and useful information. Besides the factors mentioned above, information visualization can increase the availability level of in...
Journal title:
- Journal of Documentation
Volume 48, issue
Pages -
Publication date: 1992